MARS: A Matching and Ranking System for XML Content and Structure Retrieval

Authors

  • S. Alireza Aghili
  • Hua-Gang Li
  • Divyakant Agrawal
  • Amr El Abbadi
Abstract

Structural queries specify complex predicates on the content and the structure of the elements of tree-structured XML documents. Recent works have typically applied top-down decomposition of the twig patterns into (i) parent-child or ancestor-descendant relationships, or (ii) path expression queries, followed by a join operation to reconstruct the matched twig patterns. This demonstration system is the implementation prototype of an efficient heuristic-based bottom-up approach named MARS (Matching And Ranking System for XML structure queries), which matches and ranks structural query patterns for XML query processing. An efficient nearest common ancestor labeling scheme is applied to enable fast bottom-up construction of the subtree matches from the potential keywords. MARS considers both the content and the structure of queries and incorporates a variation of IR-based relevance ranking to report the top-k ranked results. The graphical user interface of MARS provides an interactive visualization of the twig query discovery.

1 Motivation of MARS

Structured twig queries specify complex patterns of selection predicates on element labels (keyword search) and their corresponding inherent structural relationships. XML query languages [1, 2, 3, 4] and the underlying relational [5, 7] or native [6, 8] databases must provide efficient and effective support for querying both the content and the inherent structure of XML documents. However, support for structural queries is usually provided by adding a layer of structural search on top of the traditional path expression query language, which performs the following general steps: (i) Decompose the twig query structure into its corresponding path expression queries. For instance, (/dblp//journal[author='Serge' AND title='XML']) is decomposed into the path queries (/dblp//journal[author='Serge']) and (/dblp//journal[title='XML']), (ii) Perform ...
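The decomposition in step (i) can be illustrated with a minimal Python sketch. It is not the MARS implementation: the helper name `decompose_twig`, the regular expression, and the assumption that branch predicates are joined by a literal ` AND ` are all hypothetical and chosen only to reproduce the example query from the text.

```python
import re
from typing import List

# Hypothetical pattern: a path prefix followed by one bracketed predicate group.
# Assumes a single predicate group at the end of the query, as in the example.
TWIG_RE = re.compile(r"^(?P<path>[^\[\]]+)\[(?P<preds>.+)\]$")


def decompose_twig(query: str) -> List[str]:
    """Split a twig query into one path query per AND-ed predicate.

    Illustrative only: predicate values that themselves contain the
    text ' AND ' are not handled by this simple split.
    """
    query = query.strip()
    match = TWIG_RE.match(query)
    if match is None:
        # No predicate group: the query is already a plain path expression.
        return [query]
    path = match.group("path").strip()
    predicates = [p.strip() for p in match.group("preds").split(" AND ")]
    return [f"{path}[{pred}]" for pred in predicates]


if __name__ == "__main__":
    for path_query in decompose_twig(
        "/dblp//journal[author='Serge' AND title='XML']"
    ):
        print(path_query)
```

Each resulting path query would then be evaluated separately and the partial results joined, which is the join cost that a bottom-up approach aims to avoid.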

Similar resources

Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica

Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...

PIX: A System for Phrase Matching in XML Documents: A Demonstration

We present a system that enables flexible and efficient phrase matching in XML documents. Since XML allows structured and unstructured information to be interleaved, phrase matching in XML raises new challenges. Our system, named PIX, permits phrase matching in XML documents that contain “mixed content”. A key feature of PIX is that users can specify which element and content to ignore when mat...

Integration of IR into an XML Database

Structure matching has been the focus and strength of standard XML querying. However, textual content is still an essential component of XML data. It is therefore important to extend the standard XML database engine to allow for “Information Retrieval” style queries, namely, “keyword” based retrieval and “result ranking”. In this paper, we describe our effort in integrating information retrieva...

Enhancing Content-And-Structure Information Retrieval using a Native XML Database

Three approaches to content-and-structure XML retrieval are analysed in this paper: first by using Zettair, a fulltext information retrieval system; second by using eXist, a native XML database, and third by using a hybrid XML retrieval system that uses eXist to produce the final answers from likely relevant articles retrieved by Zettair. INEX 2003 content-and-structure topics can be classified...

Aggregated Feature Retrieval for MPEG-7

In this paper we present an initial study on the use of both high and low level MPEG-7 descriptions for video retrieval. A brief survey of current XML indexing techniques shows that an IR-based retrieval method provides a better foundation for retrieval as it satisfies important retrieval criteria such as content ranking and approximate matching. An aggregation technique for XML document retrie...

Publication date: 2005